3  Application in Cox PH models

Comparison of coxph versus svycoxph after multiple imputation and propensity score matching

Author

Janick Weberpals, RPh, PhD

Published

September 22, 2024

In Chapter 3 we illustrate a reproducible example of how to use coxph (survival package (Therneau 2024)) and svycoxph (survey package (Lumley 2024)) in combination with multiple imputation by chained equations (mice package (Buuren and Groothuis-Oudshoorn 2011)) and propensity score matching using the MatchThem package (Pishgar et al. 2021).

First, we load the required R libraries/packages and some custom functions that are part of the encore.io R package (non-public), which is being developed to streamline the analysis of all ENCORE trial emulations.

library(dplyr)
library(survival)
library(mice)
library(MatchThem)
library(survey)
library(here)
library(gtsummary)
library(parallelly)
library(ranger)
library(furrr)
library(cobalt)
library(gsDesign)

source(here("functions", "source_encore.io_functions.R"))

# track time
runtime <- tictoc::tic()

3.1 Data generation

We use the simulate_flaura() function to simulate a realistic oncology comparative effectiveness analytic cohort with distributions similar to FLAURA, a randomized controlled trial that compared the efficacy and safety of osimertinib with standard-of-care (SoC) tyrosine kinase inhibitors (TKIs) in patients with advanced NSCLC and a sensitizing EGFR mutation.

The following cohort resembles distributions observed in the EHR-derived EDB1 dataset used in ENCORE. Note: the values of some continuous covariates (labs) are displayed after log/log-log transformation.

# load example dataset with missing observations
data_miss <- simulate_flaura(
  n_total = 3500, 
  seed = 42, 
  include_id = FALSE, 
  imposeNA = TRUE,
  propNA = .33
  )

# create Table 1
data_miss |> 
  tbl_summary(
    by = "treat", 
    include = table1_covariates$covariate,
    label = table1_labels
    ) |> 
  add_overall() |> 
  modify_header(
    label ~ "**Patient characteristic**",
    stat_0 ~ "**Total** <br> N = {N}",
    stat_1 ~ "**Comparator** <br> N = {n} <br> ({style_percent(p, digits=1)}%)",
    stat_2 ~ "**Exposure** <br> N = {n} <br> ({style_percent(p, digits=1)}%)"
    ) |> 
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment received**") |> 
  modify_caption("**Table 1. Patient Characteristics**")
Table 1. Patient Characteristics

| Patient characteristic | Total (N = 3500) | Comparator (N = 1,487; 42.5%) | Exposure (N = 2,013; 57.5%) |
|---|---|---|---|
| Age at index date¹ | 69 (64, 74) | 69 (64, 74) | 69 (64, 74) |
| Sex¹ | 1,146 (33%) | 494 (33%) | 652 (32%) |
| Race¹ | | | |
| &nbsp;&nbsp;Asian | 835 (36%) | 347 (35%) | 488 (36%) |
| &nbsp;&nbsp;Other | 54 (2.3%) | 22 (2.2%) | 32 (2.4%) |
| &nbsp;&nbsp;White | 1,449 (62%) | 625 (63%) | 824 (61%) |
| &nbsp;&nbsp;Unknown | 1,162 | 493 | 669 |
| Smoking history¹ | 1,033 (44%) | 482 (48%) | 551 (41%) |
| &nbsp;&nbsp;Unknown | 1,162 | 493 | 669 |
| ECOG¹ | 1,327 (57%) | 583 (59%) | 744 (55%) |
| &nbsp;&nbsp;Unknown | 1,162 | 493 | 669 |
| Index year¹ | | | |
| &nbsp;&nbsp;<2018 | 3,324 (95%) | 1,400 (94%) | 1,924 (96%) |
| &nbsp;&nbsp;2018+ | 176 (5.0%) | 87 (5.9%) | 89 (4.4%) |

Comparator and Exposure columns refer to treatment received. ¹ Median (Q1, Q3); n (%)

3.2 Step 1 - Multiple imputation

The first step after deriving the analytic cohort is the creation of multiply imputed datasets using the mice R package (Buuren and Groothuis-Oudshoorn 2011).

The mice algorithm is one particular instance of a fully conditionally specified model. The algorithm starts with a random draw from the observed data, and imputes the incomplete data in a variable-by-variable fashion. One iteration consists of one cycle through all \(Y_j\).

MICE algorithm for imputation of multivariate missing data.

The number of imputations \(M\) (= number of imputed datasets) in this example is 10, but in ENCORE we follow Stef van Buuren’s advice:

[…] if calculation is not prohibitive, we may set \(M\) to the average percentage of missing data.

(Flexible Imputation of Missing Data, Section 2.8)
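Following this advice, the average percentage of missing data can be computed directly from the incomplete cohort to guide the choice of \(M\). A minimal sketch, using the data_miss object created above:

```r
# average percentage of missing cells across the whole dataset;
# van Buuren's rule of thumb would set m to roughly this value
p_miss <- mean(is.na(data_miss)) * 100

# alternatively, average the per-variable missingness proportions
p_miss_by_var <- mean(colMeans(is.na(data_miss))) * 100

round(p_miss, 1)
```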

Following the results of various simulation studies (Shah et al. 2014; Weberpals et al. 2024), we use a non-parametric (random forest-based) imputation approach as the actual imputation algorithm.

Advantages of non-parametric imputation approaches
  • Parametric imputation models must be correctly specified, i.e., nonlinear and non-additive covariate relationships have to be modeled explicitly

  • Many imputation algorithms cannot handle mixed data types

  • Random forest-based algorithms are a popular non-parametric alternative

Note: In this example we use the futuremice() function instead of the legacy mice() function to run the imputation in parallel across all but one of the available cores.

# impute data
data_imp <- futuremice(
  parallelseed = 42,
  n.core = parallel::detectCores()-1,
  data = data_miss,
  method = "rf",
  m = 10,
  print = FALSE
  )

The imputation step creates an object of class…

class(data_imp)
[1] "mids"

…which stands for multiply imputed dataset. It contains important information on the imputation procedure and the actual imputed datasets.
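Before moving on, it is good practice to inspect the mids object with standard mice accessors; a brief sketch:

```r
# number of imputed datasets
data_imp$m

# imputation method per variable ("rf" for imputed variables, "" for complete ones)
data_imp$method

# extract the first completed dataset as a regular data.frame
completed_1 <- mice::complete(data_imp, action = 1)
anyNA(completed_1)  # should be FALSE if all missing values were imputed
```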

3.3 Step 2 - Propensity score matching and weighting

We apply propensity score matching and weighting within each imputed dataset. As pointed out in Section 2.3.1, the MIte approach performed best in terms of bias, standardized differences/balance, coverage rate and variance estimation. In MatchThem, this approach is referred to as the within approach (performing matching within each imputed dataset), while the inferior MIps approach (estimating propensity scores within each dataset, averaging them across datasets, and performing matching on the averaged propensity scores in each dataset) is referred to as the across approach. Since MIte/within has been shown to have superior performance in most cases, we only illustrate this approach here.

Let’s assume we fit the following propensity score model within each imputed dataset.

# apply propensity score matching on mids object
ps_form <- as.formula(paste("treat ~", paste(covariates_for_ps, collapse = " + ")))
ps_form
treat ~ dem_age_index_cont + dem_sex_cont + c_smoking_history + 
    c_number_met_sites + c_hemoglobin_g_dl_cont + c_urea_nitrogen_mg_dl_cont + 
    c_platelets_10_9_l_cont + c_calcium_mg_dl_cont + c_glucose_mg_dl_cont + 
    c_lymphocyte_leukocyte_ratio_cont + c_alp_u_l_cont + c_protein_g_l_cont + 
    c_alt_u_l_cont + c_albumin_g_l_cont + c_bilirubin_mg_dl_cont + 
    c_chloride_mmol_l_cont + c_monocytes_10_9_l_cont + c_eosinophils_leukocytes_ratio_cont + 
    c_ldh_u_l_cont + c_hr_cont + c_sbp_cont + c_oxygen_cont + 
    c_ecog_cont + c_neutrophil_lymphocyte_ratio_cont + c_bmi_cont + 
    c_ast_alt_ratio_cont + c_stage_initial_dx_cont + dem_race + 
    dem_region + dem_ses + c_time_dx_to_index

The matching step uses the matchthem() function, a wrapper around the matchit() function. This function not only provides the functionality to match on the propensity score, but also to perform (coarsened) exact matching, cardinality matching, genetic matching and more. In this example, we use simple 1:1 nearest neighbor matching on the propensity score (estimated through logistic regression) without replacement, with a caliper of 1% of the standard deviation of the propensity score.

# matching
mimids_data <- matchthem(
  formula = ps_form,
  datasets = data_imp,
  approach = 'within',
  method = 'nearest',
  distance = "glm",
  link = "logit",
  caliper = 0.01,
  ratio = 1,
  replace = FALSE
  )

# print summary for matched dataset #1
mimids_data
A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score [caliper]
             - estimated with logistic regression
 - caliper: <distance> (0.001)
 - number of obs.: 3500 (original), 2660 (matched)
 - target estimand: ATT
 - covariates: dem_age_index_cont, dem_sex_cont, c_smoking_history, c_number_met_sites, c_hemoglobin_g_dl_cont, c_urea_nitrogen_mg_dl_cont, c_platelets_10_9_l_cont, c_calcium_mg_dl_cont, c_glucose_mg_dl_cont, c_lymphocyte_leukocyte_ratio_cont, c_alp_u_l_cont, c_protein_g_l_cont, c_alt_u_l_cont, c_albumin_g_l_cont, c_bilirubin_mg_dl_cont, c_chloride_mmol_l_cont, c_monocytes_10_9_l_cont, c_eosinophils_leukocytes_ratio_cont, c_ldh_u_l_cont, c_hr_cont, c_sbp_cont, c_oxygen_cont, c_ecog_cont, c_neutrophil_lymphocyte_ratio_cont, c_bmi_cont, c_ast_alt_ratio_cont, c_stage_initial_dx_cont, dem_race, dem_region, dem_ses, c_time_dx_to_index

The resulting “mimids” object contains the original imputed data and the output of the calls to matchit() applied to each imputed dataset.
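For a quick sanity check, individual matched datasets can be extracted from the mimids object with MatchThem::complete(); a small sketch:

```r
# extract matched dataset #1; all = FALSE drops unmatched units
# (i.e., observations with a zero estimated weight)
matched_1 <- MatchThem::complete(mimids_data, action = 1, all = FALSE)

# 1:1 matching should leave equal numbers of exposed and comparator patients
table(matched_1$treat)
```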

The weighting step is performed very similarly using the weightthem() function. In this example we apply SMR weighting to arrive at the same ATT estimand as matching, which is indicated through the estimand = "ATT" argument. If we instead wanted to weight patients based on overlap weights, estimand = "ATO" would need to be specified (one of the sensitivity analyses in the FLAURA protocol).

To mitigate the risk of extreme weights, the subsequent trim() function truncates large weights by setting all weights above a given quantile (here the 95th percentile) to the weight at that quantile. Since we specify lower = TRUE, the same truncation is applied symmetrically at the 5th percentile.

# SMR weighting
wimids_data <- weightthem(
  formula = ps_form,
  datasets = data_imp,
  approach = 'within',
  method = "glm",
  estimand = "ATT"
  )

# trim extreme weights
wimids_data <- trim(
  x = wimids_data, 
  at = .95, 
  lower = TRUE
  )

wimids_data
A weightit object
 - method: "glm" (propensity score weighting with GLM)
 - number of obs.: 3500
 - sampling weights: none
 - treatment: 2-category
 - estimand: ATT (focal: 1)
 - covariates: dem_age_index_cont, dem_sex_cont, c_smoking_history, c_number_met_sites, c_hemoglobin_g_dl_cont, c_urea_nitrogen_mg_dl_cont, c_platelets_10_9_l_cont, c_calcium_mg_dl_cont, c_glucose_mg_dl_cont, c_lymphocyte_leukocyte_ratio_cont, c_alp_u_l_cont, c_protein_g_l_cont, c_alt_u_l_cont, c_albumin_g_l_cont, c_bilirubin_mg_dl_cont, c_chloride_mmol_l_cont, c_monocytes_10_9_l_cont, c_eosinophils_leukocytes_ratio_cont, c_ldh_u_l_cont, c_hr_cont, c_sbp_cont, c_oxygen_cont, c_ecog_cont, c_neutrophil_lymphocyte_ratio_cont, c_bmi_cont, c_ast_alt_ratio_cont, c_stage_initial_dx_cont, dem_race, dem_region, dem_ses, c_time_dx_to_index
 - weights trimmed at 5% and 95%

The resulting “wimids” object contains the original imputed data and the output of the calls to weightit() applied to each imputed dataset.

3.4 Step 3 - Balance assessment

Balance in multiply imputed and matched/weighted data can be assessed in much the same way as in a single complete dataset. For illustration we only look at the matched datasets, but the exact same principles apply to the weighted datasets.

Table 3.1: Covariate balance table.
# create balance table
balance_table <- bal.tab(
  x = mimids_data, 
  stats = "m",
  abs = TRUE
  )

balance_table
Balance summary across all imputations
                                        Type Mean.Diff.Adj Max.Diff.Adj
distance                            Distance        0.0053       0.0056
dem_age_index_cont                   Contin.        0.0164       0.0523
dem_sex_cont                          Binary        0.0058       0.0135
c_smoking_history                     Binary        0.0065       0.0171
c_number_met_sites                   Contin.        0.0080       0.0197
c_hemoglobin_g_dl_cont               Contin.        0.0201       0.0328
c_urea_nitrogen_mg_dl_cont           Contin.        0.0144       0.0302
c_platelets_10_9_l_cont              Contin.        0.0150       0.0331
c_calcium_mg_dl_cont                 Contin.        0.0195       0.0425
c_glucose_mg_dl_cont                 Contin.        0.0130       0.0378
c_lymphocyte_leukocyte_ratio_cont    Contin.        0.0218       0.0646
c_alp_u_l_cont                       Contin.        0.0294       0.0536
c_protein_g_l_cont                   Contin.        0.0108       0.0309
c_alt_u_l_cont                       Contin.        0.0071       0.0176
c_albumin_g_l_cont                   Contin.        0.0104       0.0286
c_bilirubin_mg_dl_cont               Contin.        0.0100       0.0264
c_chloride_mmol_l_cont               Contin.        0.0131       0.0287
c_monocytes_10_9_l_cont              Contin.        0.0141       0.0456
c_eosinophils_leukocytes_ratio_cont  Contin.        0.0127       0.0270
c_ldh_u_l_cont                       Contin.        0.0169       0.0327
c_hr_cont                            Contin.        0.0202       0.0414
c_sbp_cont                           Contin.        0.0169       0.0493
c_oxygen_cont                        Contin.        0.0175       0.0487
c_ecog_cont                           Binary        0.0053       0.0112
c_neutrophil_lymphocyte_ratio_cont   Contin.        0.0079       0.0180
c_bmi_cont                           Contin.        0.0145       0.0370
c_ast_alt_ratio_cont                 Contin.        0.0179       0.0334
c_stage_initial_dx_cont              Contin.        0.0204       0.0456
dem_race_Asian                        Binary        0.0101       0.0183
dem_race_Other                        Binary        0.0012       0.0037
dem_race_White                        Binary        0.0104       0.0220
dem_region_Midwest                    Binary        0.0060       0.0095
dem_region_Northeast                  Binary        0.0040       0.0090
dem_region_South                      Binary        0.0070       0.0147
dem_region_West                       Binary        0.0073       0.0165
dem_ses                              Contin.        0.0180       0.0402
c_time_dx_to_index                   Contin.        0.0102       0.0400

Average sample sizes across imputations
               0      1
All       1487.  2013. 
Matched   1350.1 1350.1
Unmatched  136.9  662.9
love.plot(
  x = mimids_data,
  abs = TRUE,
  thresholds = 0.1, 
  drop.distance = TRUE,
  var.order = "unadjusted",
  colors = c("orange", "blue"), 
  stars = "std",
  shapes = 17, 
  size = 4, 
  grid = TRUE,
  position = "top"
  )
Figure 3.1: Covariate balance plot (love plot).
bal.plot(
  x = mimids_data,
  var.name = "distance",
  which = "both",
  which.imp = .none,
  colors = c("orange", "blue")
  )

For power calculations, we use the method proposed by Schoenfeld (Schoenfeld 1983) to compute the type II error rate \(\beta\) (and hence power = 1 − \(\beta\)). For this, we assume the following:

  • \(\alpha\) = 0.05 (two-sided)

  • % exposed (1:1 matching in main analysis) = 50%

  • HR (desired) = 0.8

  • Events = as observed in data

Since we have multiply imputed and matched datasets, we need to average the number of events across imputations before computing \(\beta\).

# make long dataset
data_long <- MatchThem::complete(
  # datasets
  data = mimids_data, 
  # produces a dataset with multiply imputed datasets stacked vertically
  action = "long", 
  # do NOT include observations with a zero estimated weight (non-matched)
  all = FALSE, 
  # do NOT include original dataset with missing values
  include = FALSE
  )

# compute average number of events
# by summing up all events
# and dividing by number of imputed datasets
avg_events <- sum(data_long$death_itt)/mimids_data$object$m

# compute beta
beta_gsDesign <- nEvents(
  alpha = 0.05, 
  sided = 2,
  n = avg_events,
  hr = .8,
  ratio = 1,
  tbl = TRUE
  )

# print results
cat("beta is", beta_gsDesign$beta, "\n")
beta is 7.075859e-05 
cat("power is", (1-beta_gsDesign$beta)*100, "% \n")
power is 99.99292 % 
# gsDesign table
beta_gsDesign
   hr      n alpha sided         beta     Power     delta ratio hr0         se
1 0.8 2670.3  0.05     2 7.075859e-05 0.9999292 0.1115718     1   1 0.03870348
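As a cross-check, Schoenfeld's approximation can be reproduced by hand in base R: with \(d\) events and a proportion exposed \(p\), the standard error of the log hazard ratio is approximately \(1/\sqrt{d\,p(1-p)}\). A minimal sketch using the figures from the gsDesign table above:

```r
# Schoenfeld approximation: se(log HR) ~ 1 / sqrt(d * p * (1 - p))
d     <- 2670.3  # average number of events (from the gsDesign table above)
p     <- 0.5     # proportion exposed after 1:1 matching
hr    <- 0.8     # hazard ratio to detect
alpha <- 0.05

se    <- 1 / sqrt(d * p * (1 - p))
power <- pnorm(abs(log(hr)) / se - qnorm(1 - alpha / 2))

round(se, 5)     # ~0.03870, matching gsDesign's se column
round(power, 5)  # ~0.99993, matching gsDesign's Power column
```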

3.5 Step 4 - Estimation of marginal treatment effects

Next, we compare the marginal treatment effect estimates coming from a Cox proportional hazards model after propensity score matching and weighting as implemented in the coxph() and in the svycoxph() functions.

From the MatchThem documentation:

Important
  • with() applies the supplied model in expr to the (matched or weighted) multiply imputed datasets, automatically incorporating the (matching) weights when possible. The argument to expr should be of the form glm(y ~ z, family = quasibinomial), for example, excluding the data or weights argument, which are automatically supplied.

  • Functions from the survey package, such as svyglm(), are treated a bit differently. No svydesign object needs to be supplied because with() automatically constructs and supplies it with the imputed dataset and estimated weights. When cluster = TRUE (or with() detects that pairs should be clustered; see the cluster argument above), pair membership is supplied to the ids argument of svydesign().

  • After weighting using weightthem(), glm_weightit() should be used as the modeling function to fit generalized linear models. It correctly produces robust standard errors that account for estimation of the weights, if possible. See WeightIt::glm_weightit() for details. Otherwise, svyglm() should be used rather than glm() in order to correctly compute standard errors.

  • For Cox models, coxph() will produce approximately correct standard errors when used with weighting, but svycoxph() will produce more accurate standard errors when matching is used.

We now want to compare treatment effect estimates for treat when computed with (a) coxph (survival package) and (b) svycoxph (survey package). More information on estimating treatment effects after matching is available at https://kosukeimai.github.io/MatchIt/articles/estimating-effects.html#survival-outcomes

3.5.0.1 coxph

# coxph result
coxph_results <- with(
  data = mimids_data,
  expr = coxph(formula = Surv(fu_itt_months, death_itt) ~ treat, 
               weights = weights, 
               cluster = subclass,
               robust = TRUE
               )
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survival") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

coxph_results

3.5.0.2 svycoxph

# svycoxph result
svycoxph_results <- with(
  data = mimids_data,
  expr = svycoxph(formula = Surv(fu_itt_months, death_itt) ~ treat),
  cluster = TRUE
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survey") |> 
  select(package, term, estimate, std.error, conf.low, conf.high)

svycoxph_results

3.5.0.3 Summary

rbind(coxph_results, svycoxph_results)
   package  term  estimate  std.error  conf.low conf.high
1 survival treat 0.7069473 0.04325520 0.6491616 0.7698769
2   survey treat 0.7069473 0.04326807 0.6491773 0.7698583

3.5.0.4 coxph

# coxph result
coxph_results <- with(
  data = wimids_data,
  expr = coxph(formula = Surv(fu_itt_months, death_itt) ~ treat,
               weights = weights, 
               robust = TRUE
               )
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survival") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

coxph_results

3.5.0.5 svycoxph

# svycoxph result
svycoxph_results <- with(
  data = wimids_data,
  expr = svycoxph(formula = Surv(fu_itt_months, death_itt) ~ treat),
  cluster = TRUE
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survey") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

svycoxph_results

3.5.0.6 Summary

rbind(coxph_results, svycoxph_results)
   package  term  estimate  std.error  conf.low conf.high
1 survival treat 0.7061904 0.03553485 0.6586554 0.7571561
2   survey treat 0.7061904 0.03553974 0.6586657 0.7571442

3.6 Session info

Script runtime: 0.99 minutes.

pander::pander(subset(data.frame(sessioninfo::package_info()), attached==TRUE, c(package, loadedversion)))
| package | loadedversion |
|---|---|
| cobalt | 4.5.5 |
| dplyr | 1.1.4 |
| furrr | 0.3.1 |
| future | 1.34.0 |
| gsDesign | 3.6.4 |
| gtsummary | 2.0.1 |
| here | 1.0.1 |
| MatchThem | 1.2.1 |
| Matrix | 1.7-0 |
| mice | 3.16.0 |
| parallelly | 1.38.0 |
| ranger | 0.16.0 |
| survey | 4.4-2 |
| survival | 3.5-8 |
pander::pander(sessionInfo())

R version 4.4.0 (2024-04-24)

Platform: x86_64-pc-linux-gnu

locale: LC_CTYPE=C.UTF-8, LC_NUMERIC=C, LC_TIME=C.UTF-8, LC_COLLATE=C.UTF-8, LC_MONETARY=C.UTF-8, LC_MESSAGES=C.UTF-8, LC_PAPER=C.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=C.UTF-8 and LC_IDENTIFICATION=C

attached base packages: grid, stats, graphics, grDevices, datasets, utils, methods and base

other attached packages: gsDesign(v.3.6.4), cobalt(v.4.5.5), furrr(v.0.3.1), future(v.1.34.0), ranger(v.0.16.0), parallelly(v.1.38.0), gtsummary(v.2.0.1), here(v.1.0.1), survey(v.4.4-2), Matrix(v.1.7-0), MatchThem(v.1.2.1), mice(v.3.16.0), survival(v.3.5-8) and dplyr(v.1.1.4)

loaded via a namespace (and not attached): tidyselect(v.1.2.1), farver(v.2.1.2), fastmap(v.1.2.0), digest(v.0.6.37), rpart(v.4.1.23), lifecycle(v.1.0.4), magrittr(v.2.0.3), compiler(v.4.4.0), rlang(v.1.1.4), sass(v.0.4.9), tools(v.4.4.0), utf8(v.1.2.4), yaml(v.2.3.10), gt(v.0.11.0), knitr(v.1.48), labeling(v.0.4.3), htmlwidgets(v.1.6.4), xml2(v.1.3.6), r2rtf(v.1.1.1), withr(v.3.0.1), purrr(v.1.0.2), nnet(v.7.3-19), fansi(v.1.0.6), jomo(v.2.7-6), xtable(v.1.8-4), colorspace(v.2.1-1), ggplot2(v.3.5.1), globals(v.0.16.3), scales(v.1.3.0), iterators(v.1.0.14), MASS(v.7.3-60.2), cli(v.3.6.3), rmarkdown(v.2.28), crayon(v.1.5.3), generics(v.0.1.3), sessioninfo(v.1.2.2), commonmark(v.1.9.1), minqa(v.1.2.8), DBI(v.1.2.3), pander(v.0.6.5), stringr(v.1.5.1), splines(v.4.4.0), assertthat(v.0.2.1), parallel(v.4.4.0), base64enc(v.0.1-3), mitools(v.2.4), vctrs(v.0.6.5), WeightIt(v.1.3.0), boot(v.1.3-30), glmnet(v.4.1-8), jsonlite(v.1.8.8), mitml(v.0.4-5), listenv(v.0.9.1), locfit(v.1.5-9.10), foreach(v.1.5.2), tidyr(v.1.3.1), glue(v.1.7.0), nloptr(v.2.1.1), pan(v.1.9), chk(v.0.9.2), codetools(v.0.2-20), stringi(v.1.8.4), shape(v.1.4.6.1), gtable(v.0.3.5), lme4(v.1.1-35.5), munsell(v.0.5.1), tibble(v.3.2.1), pillar(v.1.9.0), htmltools(v.0.5.8.1), R6(v.2.5.1), rprojroot(v.2.0.4), evaluate(v.0.24.0), lattice(v.0.22-6), markdown(v.1.13), backports(v.1.5.0), cards(v.0.2.1), tictoc(v.1.2.1), MatchIt(v.4.5.5), broom(v.1.0.6), renv(v.1.0.7), simsurv(v.1.0.0), Rcpp(v.1.0.13), nlme(v.3.1-164), xfun(v.0.47) and pkgconfig(v.2.0.3)

pander::pander(options('repos'))
  • repos:

    REPO_NAME
    https://packagemanager.posit.co/cran/linux/noble/latest
